Lemma selection in domain specific computational lexica - some specific problems
Abstract
This paper describes the lemma selection process for domain-specific language in a Danish computational lexicon, the STO project, and focuses on some specific problems encountered during that process. After a short introduction to the STO project and an explanation of why the lemmas are selected from a corpus rather than chosen from existing dictionaries, the lemma selection process for domain-specific language is described in detail. The aim is to make the process as automatic as possible, although a manual examination of the final candidate lemma lists is unavoidable. The lemmas found in the corpora are compared to a list of general-language lemmas, and lemmas already encoded in the database are sorted out. Words that have already been encoded as general-language words but that also occur in a specific domain with another meaning, and perhaps another syntactic behaviour, should be kept on a separate list, and the paper describes how this is done. Recognizing borrowed words whose spelling has not yet been established poses a major problem for automatic lemma selection. The paper gives some examples of this problem and describes how the STO project tries to solve it.
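The comparison step described in the abstract can be illustrated with a minimal sketch. This is not the STO project's implementation. The function name `split_candidates`, the toy word lists, and the choice to send every already-encoded lemma found in the domain corpus to a manual review list are all assumptions made for illustration. The abstract does not say how a domain-specific meaning is detected, so the sketch leaves that decision to the manual examination it mentions.

```python
def split_candidates(domain_lemmas, general_lexicon):
    """Compare lemmas found in a domain corpus with a general-language list.

    Returns two sorted lists:
      - new_candidates: lemmas not yet encoded in the general lexicon
      - review_list: lemmas already encoded, kept for manual examination
        because they may carry another meaning in the domain (assumption:
        the sketch does not attempt to detect that automatically)
    """
    seen = set(domain_lemmas)        # ignore repeated occurrences
    general = set(general_lexicon)
    new_candidates = sorted(seen - general)
    review_list = sorted(seen & general)
    return new_candidates, review_list


# Hypothetical toy data: lemmas from an IT-domain corpus and a small
# general-language list (Danish "mus" = mouse, "vindue" = window).
domain = ["mus", "server", "firewall", "vindue", "server"]
general = ["mus", "hus", "vindue"]

new_candidates, review_list = split_candidates(domain, general)
print(new_candidates)  # ['firewall', 'server']
print(review_list)     # ['mus', 'vindue']
```

Keeping the review list separate from the new candidates mirrors the abstract's distinction between lemmas that are simply sorted out and lemmas that may need an additional domain-specific encoding.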
Similar resources
Morphology Based Automatic Acquisition of Large-coverage Lexica
In this article, we introduce a new technique for constructing wide-coverage morphological lexica from large corpora and morphological knowledge, with an application to French. Basically, it relies on the idea that the existence of a hypothetical lemma can be guessed if several different words found in the corpus are best interpreted as morphological variants of this lemma. We first validated o...
Affect Proxies and Ontological Change: A Finance Case Study
Traditional sentiment analysis has focused on inferring sentiment polarity from sentiment-bearing words. In this paper, we propose a new way of studying sentiment and capturing ontological changes in a domain specific context from the perspective of computational linguistics using affect proxies. We used the Nexis service to create a domain specific corpus focusing on banking sectors. W...
Enriching Morphological Lexica through Unsupervised Derivational Rule Acquisition
In a morphological lexicon, each entry combines a lemma with a specific inflection class, often defined by a set of inflection rules. Therefore, such lexica usually give a satisfying account of inflectional operations. Derivational information, however, is usually badly covered. In this paper we introduce a novel approach for enriching morphological lexica with derivational links between entrie...
Determination of Discontinuous Epitopes of the Human Immunoglobulin Light Chain by Computational Immunology
Background: Immunoglobulins are a group of proteins that have an important role in defense against microorganisms. Immunoglobulins consist of heavy and light chains. In humans, the immunoglobulin light chain comprises two isotypes, kappa (K) and lambda (λ), based on amino acid differences in the carboxylic end of their constant region. Marked changes in the K to λ ratio can happen in monocl...
GENERAL SOLUTION OF ELASTICITY PROBLEMS IN TWO DIMENSIONAL POLAR COORDINATES USING MELLIN TRANSFORM
In this work, the Mellin transform method was used to obtain solutions for the stress field components in two dimensional (2D) elasticity problems in terms of plane polar coordinates. The Mellin transformation was applied to the biharmonic stress compatibility equation expressed in terms of the Airy stress potential function, and the boundary value problem transformed to an algebraic ...